
On the Performance of Latent Semantic Indexing-based Information Retrieval


Abstract

Conventional vector-based Information Retrieval (IR) models, the Vector Space Model (VSM) and the Generalized Vector Space Model (GVSM), represent documents and queries as vectors in a multidimensional space. This high-dimensional data places great demands on computing resources. To overcome this problem, Latent Semantic Indexing (LSI), a variant of VSM, projects the documents into a lower-dimensional space computed via Singular Value Decomposition. The IR literature states that the LSI model is 30% more effective than classical VSM models. However, statistical significance tests are required to evaluate the reliability of such comparisons, and to the best of our knowledge the significance of the LSI model's performance has not been analyzed so far. This paper addresses that issue. We discuss the tradeoffs of VSM, GVSM and LSI and empirically evaluate the differences in their performance on four test document collections. We then analyze the statistical significance of these performance differences.
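The two ideas in the abstract, projecting documents into a lower-dimensional space via Singular Value Decomposition and testing whether a performance difference is significant, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy term-by-document matrix, the function names, the choice of k = 2, and the paired sign-flip randomization test (the abstract does not say which significance test the authors use). Documents are compared to the query in the coordinates of the rank-k approximation, which is one common LSI variant.

```python
import numpy as np

def cosine_scores(query_vec, doc_rows):
    """Cosine similarity between one query vector and each row of doc_rows."""
    num = doc_rows @ query_vec
    den = np.linalg.norm(doc_rows, axis=1) * np.linalg.norm(query_vec)
    return num / np.maximum(den, 1e-12)  # guard against zero-length vectors

def lsi_fit(term_doc, k):
    """Truncated SVD of a term-by-document matrix.

    Returns U_k (terms x k) and the documents as rows in the k-dim latent space.
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k = U[:, :k]
    docs_k = Vt[:k, :].T * s[:k]  # rows of V_k scaled by the singular values
    return U_k, docs_k

def paired_randomization_test(scores_a, scores_b, n_rounds=10000, seed=0):
    """Two-sided sign-flip randomization test on paired per-query scores."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(scores_a, dtype=float) - np.asarray(scores_b, dtype=float)
    observed = abs(diffs.mean())
    signs = rng.choice([-1.0, 1.0], size=(n_rounds, diffs.size))
    permuted = np.abs((signs * diffs).mean(axis=1))
    hits = np.sum(permuted >= observed - 1e-12)
    return float((hits + 1) / (n_rounds + 1))

# Hypothetical toy collection. Rows are terms, columns are documents.
# terms: car, auto, engine, flower, petal
term_doc = np.array([
    [1, 0, 0, 0],   # car
    [0, 1, 0, 0],   # auto
    [1, 1, 0, 0],   # engine
    [0, 0, 1, 1],   # flower
    [0, 0, 1, 1],   # petal
], dtype=float)
query = np.array([1, 0, 0, 0, 0], dtype=float)  # the single term "car"

# Plain VSM: compare the query with the raw document columns.
vsm_scores = cosine_scores(query, term_doc.T)

# LSI: compare the projected query with the documents in the latent space.
U_k, docs_k = lsi_fit(term_doc, k=2)
lsi_scores = cosine_scores(query @ U_k, docs_k)

print("VSM:", np.round(vsm_scores, 3))
print("LSI:", np.round(lsi_scores, 3))
```

In this toy collection the second document contains "auto" and "engine" but not "car", so plain VSM gives it a zero score for the query, while the latent space places it alongside the first document. To compare two systems, `paired_randomization_test` would be called with one effectiveness score per query for each system (for example, average precision per query); it returns an estimated p-value for the observed mean difference.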
